Deep Anchored Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have proven to be extremely
successful at solving computer vision tasks. State-of-the-art methods favor
such deep network architectures for their accuracy, at the cost of a
massive number of parameters and high weight redundancy. Previous works
have studied how to prune such CNN weights. In this paper, we go to another
extreme and analyze the performance of a network stacked with a single
convolution kernel across layers, as well as other weight-sharing techniques.
We name it Deep Anchored Convolutional Neural Network (DACNN). Sharing the same
kernel weights across layers reduces the model size tremendously: more
precisely, the network is compressed in memory by a factor of L, where L is the
desired depth of the network, disregarding the fully connected layer used for
prediction. The number of parameters in a DACNN barely increases as the network
grows deeper, which allows us to build deep DACNNs without any concern about
memory costs. We also introduce a partial shared weights network (DACNN-mix) as
well as an easy-plug-in module, coined regulators, to boost the performance of
our architecture. We validated our idea on 3 datasets: CIFAR-10, CIFAR-100 and
SVHN. Our results show that we can save massive amounts of memory with our
model, while maintaining high accuracy.
Comment: This paper is accepted to the 2019 IEEE/CVF Conference on Computer Vision
and Pattern Recognition Workshops (CVPRW).
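The depth-independent parameter count behind the factor-L compression can be sketched with a toy calculation. The channel and kernel sizes below are illustrative assumptions, not the paper's exact configuration:

```python
# Compare the conv-parameter count of a plain CNN against a DACNN-style
# network that reuses one anchored kernel across every layer.
# All sizes here are assumptions for illustration only.

def conv_params(channels: int, kernel: int) -> int:
    """Weights in one conv layer with `channels` in/out channels (no bias)."""
    return channels * channels * kernel * kernel

def plain_cnn_params(depth: int, channels: int = 64, kernel: int = 3) -> int:
    # Each of the `depth` layers stores its own kernel.
    return depth * conv_params(channels, kernel)

def dacnn_params(depth: int, channels: int = 64, kernel: int = 3) -> int:
    # A single anchored kernel is shared by all layers, so the conv
    # parameter count does not depend on depth.
    return conv_params(channels, kernel)

L = 20
print(plain_cnn_params(L) // dacnn_params(L))  # prints 20, i.e. the factor L
```

Because `dacnn_params` is constant in `depth`, growing the network deeper costs no additional convolution memory, which is the property the abstract highlights.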
Do Deep Neural Networks Suffer from Crowding?
Crowding is a visual effect suffered by humans, in which an object that can
be recognized in isolation can no longer be recognized when other objects,
called flankers, are placed close to it. In this work, we study the effect of
crowding in artificial Deep Neural Networks for object recognition. We analyze
both standard deep convolutional neural networks (DCNNs) and a new
variant of DCNNs that is 1) multi-scale and 2) has convolution filter
sizes that change depending on the eccentricity with respect to the center of
fixation. Such networks, which we call eccentricity-dependent, are a computational model
of the feedforward path of the primate visual cortex. Our results reveal that
the eccentricity-dependent model, trained on target objects in isolation, can
recognize such targets in the presence of flankers, if the targets are near the
center of the image, whereas DCNNs cannot. Also, for all tested networks, when
trained on targets in isolation, we find that recognition accuracy of the
networks decreases the closer the flankers are to the target and the more
flankers there are. We find that visual similarity between the target and
flankers also plays a role and that pooling in early layers of the network
leads to more crowding. Additionally, we show that incorporating the flankers
into the images of the training set does not improve performance under crowding.
Comment: CBMM memo
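The eccentricity-dependent idea, i.e. convolution filter sizes that grow with distance from the center of fixation, can be sketched as a simple rule. This is my own toy formulation under assumed parameters, not the model's actual sampling scheme:

```python
# Toy sketch of an eccentricity-dependent filter-size rule: filters are
# small at the fixation center and grow with eccentricity. The base size
# and growth rate are assumptions for illustration.

def filter_size(eccentricity: float, base: int = 3, rate: float = 0.5) -> int:
    """Return an odd filter size that increases with eccentricity."""
    size = base + int(rate * eccentricity)
    # Keep the size odd so the filter has a well-defined center pixel.
    return size if size % 2 == 1 else size + 1

print(filter_size(0.0))  # prints 3: smallest filter at fixation
print(filter_size(4.0))  # prints 5: coarser filter in the periphery
```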
Foveation-based Mechanisms Alleviate Adversarial Examples
We show that adversarial examples, i.e., the visually imperceptible
perturbations that cause Convolutional Neural Networks (CNNs) to fail, can be
alleviated with a mechanism based on foveations---applying the CNN to different
image regions. To see this, first, we report results on ImageNet that lead to a
revision of the hypothesis that adversarial perturbations are a consequence of
CNNs acting as a linear classifier: CNNs act locally linearly to changes in the
image regions with objects recognized by the CNN, and in other regions the CNN
may act non-linearly. Then, we corroborate that when the neural responses are
linear, applying the foveation mechanism to the adversarial example tends to
significantly reduce the effect of the perturbation. This is because,
hypothetically, the CNNs for ImageNet are robust to changes of scale and
translation of the object produced by the foveation, but this property does not
generalize to transformations of the perturbation. As a result, the accuracy
after a foveation is almost the same as the accuracy of the CNN without the
adversarial perturbation, even if the adversarial perturbation is calculated
taking the foveation into account.
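One simple form of foveation, applying the CNN to a region around the object rather than the full frame, can be sketched as a crop. The function name and square-window choice are my assumptions; the paper's mechanisms may differ:

```python
import numpy as np

def foveate(image: np.ndarray, cx: int, cy: int, radius: int) -> np.ndarray:
    """Crop a square window centered at (cx, cy), clipped to the image.

    One minimal foveation: the CNN is then applied to this region,
    discarding perturbed background outside the window.
    """
    h, w = image.shape[:2]
    x0, x1 = max(cx - radius, 0), min(cx + radius, w)
    y0, y1 = max(cy - radius, 0), min(cy + radius, h)
    return image[y0:y1, x0:x1]
```

A centered crop of a 32x32 image with radius 8 yields a 16x16 region; near a corner the window is clipped, mirroring how a foveation rescales and reframes the object while the perturbation does not transform consistently.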
Analyzing Vision Transformers for Image Classification in Class Embedding Space
Despite the growing use of transformer models in computer vision, a
mechanistic understanding of these networks is still needed. This work
introduces a method to reverse-engineer Vision Transformers trained to solve
image classification tasks. Inspired by previous research in NLP, we
demonstrate how the inner representations at any level of the hierarchy can be
projected onto the learned class embedding space to uncover how these networks
build categorical representations for their predictions. We use our framework
to show how image tokens develop class-specific representations that depend on
attention mechanisms and contextual information, and give insights on how
self-attention and MLP layers differentially contribute to this categorical
composition. We additionally demonstrate that this method (1) can be used to
determine the parts of an image that would be important for detecting the class
of interest, and (2) exhibits significant advantages over traditional linear
probing approaches. Taken together, our results position our proposed framework
as a powerful tool for mechanistic interpretability and explainability
research.
Comment: NeurIPS 202
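The core projection step, reading out class preferences from an intermediate token state via the learned class embedding space, can be sketched with assumed shapes (in the spirit of the NLP "logit lens"; the matrix and dimensions below are illustrative, not the paper's):

```python
import numpy as np

# Project a hidden token representation at some layer through the
# classifier head W_cls to see which classes it already favors.
# Shapes and values are assumptions for illustration.
rng = np.random.default_rng(0)
d_model, n_classes = 8, 5

W_cls = rng.normal(size=(d_model, n_classes))  # learned class embeddings
hidden_token = rng.normal(size=(d_model,))     # token state at an inner layer

class_scores = hidden_token @ W_cls            # score per class
predicted = int(np.argmax(class_scores))       # class this token leans toward
```

Repeating this readout layer by layer, and token by token, is what lets one trace how class-specific information accumulates across the hierarchy.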
DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN
Recently, the introduction of the generative adversarial network (GAN) and
its variants has enabled the generation of realistic synthetic samples, which
has been used for enlarging training sets. Previous work primarily focused on
data augmentation for semi-supervised and supervised tasks. In this paper, we
instead focus on unsupervised anomaly detection and propose a novel generative
data augmentation framework optimized for this task. In particular, we propose
to oversample infrequent normal samples - normal samples that occur with small
probability, e.g., rare normal events. We show that these samples are
responsible for false positives in anomaly detection. However, oversampling of
infrequent normal samples is challenging for real-world high-dimensional data
with multimodal distributions. To address this challenge, we propose to use a
GAN variant known as the adversarial autoencoder (AAE) to transform the
high-dimensional multimodal data distributions into low-dimensional unimodal
latent distributions with well-defined tail probability. Then, we
systematically oversample at the 'edge' of the latent distributions to increase
the density of infrequent normal samples. We show that our oversampling
pipeline is a unified one: it is generally applicable to datasets with
different complex data distributions. To the best of our knowledge, our method
is the first data augmentation technique focused on improving performance in
unsupervised anomaly detection. We validate our method by demonstrating
consistent improvements across several real-world datasets.
Comment: Published as a conference paper at ICDM 2018 (IEEE International
Conference on Data Mining).
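The 'edge' oversampling step can be sketched by drawing latent codes whose norms fall in the tail of a unimodal latent prior. Assuming a standard Gaussian prior (a common AAE choice, not stated in the abstract), with illustrative quantile cutoffs:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_edge_codes(n: int, dim: int,
                      low_q: float = 0.90, high_q: float = 0.99) -> np.ndarray:
    """Draw n latent codes from the radial tail of a Gaussian prior.

    These 'edge' codes would then be passed through the decoder to
    synthesize infrequent normal samples. Quantile cutoffs are assumed.
    """
    # Estimate tail radii of the norm distribution empirically.
    norms = np.linalg.norm(rng.normal(size=(100_000, dim)), axis=1)
    lo, hi = np.quantile(norms, [low_q, high_q])
    # Uniform random directions scaled to radii in the tail band.
    dirs = rng.normal(size=(n, dim))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = rng.uniform(lo, hi, size=n)
    return dirs * radii[:, None]

codes = sample_edge_codes(16, 2)  # 16 edge codes in a 2-D latent space
```

Because the AAE maps the multimodal data distribution to this unimodal latent space, sampling a simple radial band here is enough to target infrequent normal samples, regardless of how complex the original data distribution is.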